Article: Prospects for Building Large Timetrees Using Molecular Data with Incomplete Gene Coverage among Species

Authors

  • Alan Filipski
  • Oscar Murillo
  • Anna Freydenzon
  • Koichiro Tamura
  • Sudhir Kumar
  • Claudia Russo
Abstract

Scientists are assembling sequence data sets from increasing numbers of species and genes to build comprehensive timetrees. However, data are often unavailable for some species and gene combinations, and the proportion of missing data is often large for data sets containing many genes and species. Surprisingly, there has not been a systematic analysis of the effect of the degree of sparseness of the species–gene matrix on the accuracy of divergence time estimates. Here, we present results from computer simulations and empirical data analyses to quantify the impact of missing gene data on divergence time estimation in large phylogenies. We found that estimates of divergence times were robust even when sequences from a majority of genes for most of the species were absent. From the analysis of such extremely sparse data sets, we found that the most egregious errors occurred for nodes in the tree that had no common genes for any pair of species in the immediate descendant clades of the node in question. These problematic nodes can be easily detected prior to computational analyses based only on the input sequence alignment and the tree topology. We conclude that it is best to use larger alignments, because adding both genes and species to the alignment augments the number of genes available for estimating divergence events deep in the tree and improves their time estimates.
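The abstract notes that problematic nodes can be found before any dating analysis, using only gene coverage and the tree topology. The check can be sketched roughly as follows. This is an illustrative sketch, not the authors' implementation: the nested-tuple tree encoding, the `coverage` dictionary, and the function name `find_unlinked_nodes` are all assumptions made for the example. A node is flagged when no gene is sampled on both sides of it, which is equivalent to saying that no pair of species drawn from different immediate descendant clades shares a gene.

```python
from itertools import combinations


def find_unlinked_nodes(tree, coverage):
    """Return the leaf sets of internal nodes whose child clades share no gene.

    tree     -- nested tuples of species names (hypothetical encoding),
                e.g. (("A", "B"), ("C", "D"))
    coverage -- dict mapping each species name to the set of genes sampled for it
    """
    flagged = []

    def visit(node):
        # A leaf is a species name: return its leaf set and its sampled genes.
        if isinstance(node, str):
            return frozenset([node]), set(coverage.get(node, ()))

        children = [visit(child) for child in node]

        # Two child clades are linked if at least one gene is sampled in both,
        # i.e. some cross-clade pair of species has a gene in common.
        linked = any(
            genes_a & genes_b
            for (_, genes_a), (_, genes_b) in combinations(children, 2)
        )

        leaves = frozenset().union(*(clade_leaves for clade_leaves, _ in children))
        genes = set().union(*(clade_genes for _, clade_genes in children))

        if not linked:
            flagged.append(leaves)
        return leaves, genes

    visit(tree)
    return flagged


# Toy example: the two cherries are each linked by a shared gene,
# but no gene spans the root split between (A, B) and (C, D).
tree = (("A", "B"), ("C", "D"))
coverage = {"A": {"g1"}, "B": {"g1", "g2"}, "C": {"g3"}, "D": {"g3"}}
unlinked = find_unlinked_nodes(tree, coverage)
```

In this toy case only the root is flagged. Adding any gene that is sampled on both sides of the root (for example `g1` for species `C`) clears the flag, which mirrors the abstract's point that larger alignments add genes for estimating deep divergences.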

Similar resources

Fast and Accurate Estimates of Divergence Times from Big Data.

Ongoing advances in sequencing technology have led to an explosive expansion in the molecular data available for building increasingly larger and more comprehensive timetrees. However, Bayesian relaxed-clock approaches frequently used to infer these timetrees impose a large computational burden and discourage critical assessment of the robustness of inferred times to model assumptions, influenc...


Data Mining for Identification of Forkhead Box O (FOXO3a) in Different Organisms Using Nucleotide and Tandem Repeat Sequences

Background: Deregulation of the FOXO3a gene, which belongs to the Forkhead box O (FOXO) family of transcription factors, can cause cancer (e.g. breast cancer). FOXO factors play an important role in ubiquitination, acetylation, de-acetylation, protein-protein interactions and phosphorylation. Understanding the regulation and mechanisms of FOXO3a can lead to cancer treatment. The aim of this study recent association...


Ribulose-1,5-Bisphosphate Carboxylase/Oxygenase Gene Sequencing in Taxonomic Delineation of Padina Species in the Northern Coast of the Persian Gulf (Iran)

A taxonomic study of the genus Padina (Dictyotales, Phaeophyceae) from the Persian Gulf coast was conducted based on morphology and molecular phylogenetic analyses using chloroplast-encoded large subunit RuBisCo (rbcL) gene sequences. Detailed descriptions of each species found in this study are provided. Several morphological characters, such as number of cell layers composing the thallus, pr...


Genetic Diversity of Marrubium Species from Zagros Region (Iran), Using Inter Simple Sequence Repeat Molecular Marker

This study examines the genetic diversity and taxonomic status of Marrubium species from the central and south-western Zagros region, Iran, using Inter-Simple Sequence Repeat analysis. A total of 68 accessions from five Marrubium species were collected from their natural habitats. Molecular analysis was approved with 17 primers, of which 12 were carried out in the reaction mixture. ...



Publication date: 2014